Practical 8: Applications of text mining and NLP¶

Javier Garcia-Bernardo¶

logo

Applied Text Mining - Utrecht Summer School¶

In this practical you will be answering a research question or solving a problem. For that you will create a pipeline for classification or clustering.

All the data is processed and can be found on the github repository.

Here are some proposed research questions:

Classification¶

RQ1: Identification of fake news, hate speech or spam + Interpretability of results:¶

  • Data: https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset or https://github.com/aitor-garcia-p/hate-speech-dataset (https://paperswithcode.com/dataset/hate-speech) or https://archive.ics.uci.edu/ml/datasets/YouTube+Spam+Collection
  • Goal: Evaluate performance of different methods and interpret the results using LIME

RQ2: Evaluate the importance of metadata. Create a classification system to identify the movie genre using and excluding metadata:¶

  • Data: https://www.kaggle.com/datasets/jrobischon/wikipedia-movie-plots
  • Options:
    • Create two classifications systems, one using only metadata, one using only text. Stack them to create the best model: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.StackingClassifier.html
    • Use the functional API of Keras to create one model that handles both types of inputs: https://pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/
  • Goal: Evaluate performance and interpret the results using LIME

Clustering:¶

RQ3: Create a recommendation system for movies based on their plot:¶

  • Data: https://www.kaggle.com/datasets/jrobischon/wikipedia-movie-plots
  • Output: What are the closest movies to "The Shawshank Redemption", "Goodfellas", and "Harry Potter and the Sorcerer's Stone"?

RQ4: Cluster headlines using word embeddings:¶

  • Data: https://www.ims.uni-stuttgart.de/en/research/resources/corpora/goodnewseveryone/ (https://aclanthology.org/2020.lrec-1.194.pdf)
  • Do the clusters correlate to emotions or media sources?

You can come up with your own research question using any dataset on text analysis, e.g. from:

  • UCI repository: https://archive.ics.uci.edu/ml/datasets.php?format=&task=&att=&area=&numAtt=&numIns=&type=text&sort=nameUp&view=table
  • Papers with code repository: https://paperswithcode.com/datasets?mod=texts&page=1
  • Kaggle (code examples are often included): https://www.kaggle.com/datasets?tags=13204-NLP (but given the time restrictions, choosing one of the above is recommended)
In [1]:
# path to the data
path_data = "./data/"

# How to read data (We cleaned it for you)
# data_rq1_fake = pd.read_csv(f"{path_data}/rq1_fake_news.csv.gzip",sep="\t",compression="gzip")
# data_rq1_hate_speech = pd.read_csv(f"{path_data}/rq1_hate_speech.csv.gzip",sep="\t",compression="gzip")
# data_rq1_youtube = pd.read_csv(f"{path_data}/rq1_youtube.csv.gzip",sep="\t",compression="gzip")
# data_rq2_3 = pd.read_csv(f"{path_data}/rq2_3_wiki_movie_plots.csv.gzip",sep="\t",compression="gzip")
# data_rq4 = pd.read_csv(f"{path_data}/rq4_gne-release-v1.0.csv.gzip",sep="\t",compression="gzip")
# data_rq1_fake.shape, data_rq1_hate_speech.shape, data_rq1_youtube.shape, data_rq2_3.shape, data_rq4.shape
In [2]:
# Data wrangling
import pandas as pd
import numpy as np

# Machine learning tools 
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.linear_model import LogisticRegression


# Interpretable AI
#!pip install lime
from lime.lime_text import LimeTextExplainer

RQ1: Identification of hate speech¶

  • Data on hate speech: https://github.com/aitor-garcia-p/hate-speech-dataset (https://paperswithcode.com/dataset/hate-speech)
  • Data on fake vs real news: https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset
  • Data on youtube spam messages: https://archive.ics.uci.edu/ml/datasets/YouTube+Spam+Collection

We provide code for the first dataset. Your goal is to (1) improve the classifier by using a more advanced method (2)

Data: Dataset of hate speech annotated on Internet forum posts in English at sentence-level. The source forum in Stormfront, a large online community of white nacionalists. A total of 10,568 sentence have been been extracted from Stormfront and classified as conveying hate speech or not

Step 1: Read data and create train-test split¶

In [3]:
df = pd.read_csv(f"{path_data}/rq1_hate_speech.csv.gzip",sep="\t",compression="gzip", index_col=0)
df["label"] = df["label"].map({"hate": 1, "noHate": 0})
df = df[["text","label"]]
df = df.dropna()
print(df.shape)
df.head()
(10703, 2)
Out[3]:
text label
file_id
12834217_1 As of March 13th , 2014 , the booklet had been... 0.0
12834217_2 In order to help increase the booklets downloa... 0.0
12834217_3 ( Simply copy and paste the following text int... 0.0
12834217_4 Click below for a FREE download of a colorfull... 1.0
12834217_5 Click on the `` DOWNLOAD ( 7.42 MB ) '' green ... 0.0
In [4]:
# # read data
# df = pd.read_csv(f"{path_data}/rq1_fake_news.csv.gzip",sep="\t",compression="gzip", index_col=0)
# df.rename(columns={"title": "text", "Fake":"label"})
# # descriptive stats
# df.groupby("Fake").count()
In [5]:
# split into train and test
X_train, X_test, y_train, y_test = train_test_split(df["text"].values, df["label"].values, test_size=0.33, random_state=42)

Step 2: Create pipeline and hyperparameter tuning¶

Create a pipeline that vectorizes the text and transform it using TF-IDF, and classifies the news titles using LogisticRegression.

In [6]:
# Pipeline
pipe = Pipeline([
    ('vectorizer', TfidfVectorizer(stop_words='english',  #remove stopwords
                                   lowercase=True, #convert to lowercase
                                   token_pattern=r'(?u)\b[A-Za-z][A-Za-z]+\b')), #tokens of at least 2 characters
    ('clf', LogisticRegression(max_iter=10000, dual=False, solver="saga")) #logistic regression
])


# Parameters to hyptertune
param_grid = dict(vectorizer__ngram_range=[(1,1), (1,2), (1,3)], # creation of n-grams
                  vectorizer__min_df=[1, 10, 100], # minimum support for words
                  clf__C=[0.1, 1, 10, 100], # regularization
                  clf__penalty=["l2","l1"]) # type of regularization

# Run a grid search using cross-validation to find the best parameters
grid_search = GridSearchCV(pipe, param_grid=param_grid, verbose=True, n_jobs=-1)

# to speed it up we find the hyperparameters using a sample, and fit on the entire datast later
grid_search.fit(X_train[:1000], y_train[:1000])
    
# best parameters, score and estimator
print(grid_search.best_params_)
print(grid_search.best_score_)
Fitting 5 folds for each of 72 candidates, totalling 360 fits
{'clf__C': 10, 'clf__penalty': 'l2', 'vectorizer__min_df': 1, 'vectorizer__ngram_range': (1, 2)}
0.893
/Users/garci061/miniforge3/envs/st/lib/python3.10/site-packages/sklearn/model_selection/_validation.py:372: FitFailedWarning: 
120 fits failed out of a total of 360.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
120 fits failed with the following error:
Traceback (most recent call last):
  File "/Users/garci061/miniforge3/envs/st/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 681, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/garci061/miniforge3/envs/st/lib/python3.10/site-packages/sklearn/pipeline.py", line 390, in fit
    Xt = self._fit(X, y, **fit_params_steps)
  File "/Users/garci061/miniforge3/envs/st/lib/python3.10/site-packages/sklearn/pipeline.py", line 348, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "/Users/garci061/miniforge3/envs/st/lib/python3.10/site-packages/joblib/memory.py", line 349, in __call__
    return self.func(*args, **kwargs)
  File "/Users/garci061/miniforge3/envs/st/lib/python3.10/site-packages/sklearn/pipeline.py", line 893, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/Users/garci061/miniforge3/envs/st/lib/python3.10/site-packages/sklearn/feature_extraction/text.py", line 2077, in fit_transform
    X = super().fit_transform(raw_documents)
  File "/Users/garci061/miniforge3/envs/st/lib/python3.10/site-packages/sklearn/feature_extraction/text.py", line 1347, in fit_transform
    X, self.stop_words_ = self._limit_features(
  File "/Users/garci061/miniforge3/envs/st/lib/python3.10/site-packages/sklearn/feature_extraction/text.py", line 1179, in _limit_features
    raise ValueError(
ValueError: After pruning, no terms remain. Try a lower min_df or a higher max_df.

  warnings.warn(some_fits_failed_message, FitFailedWarning)
/Users/garci061/miniforge3/envs/st/lib/python3.10/site-packages/sklearn/model_selection/_search.py:969: UserWarning: One or more of the test scores are non-finite: [0.892 0.892 0.892 0.892 0.892 0.892   nan   nan   nan 0.892 0.892 0.892
 0.892 0.892 0.892   nan   nan   nan 0.892 0.892 0.892 0.891 0.891 0.891
   nan   nan   nan 0.889 0.892 0.892 0.889 0.889 0.889   nan   nan   nan
 0.891 0.893 0.891 0.884 0.884 0.884   nan   nan   nan 0.882 0.866 0.851
 0.883 0.883 0.883   nan   nan   nan 0.885 0.891 0.892 0.882 0.882 0.882
   nan   nan   nan 0.88  0.873 0.861 0.88  0.88  0.88    nan   nan   nan]
  warnings.warn(
In [7]:
# print resutls
results = pd.DataFrame(grid_search.cv_results_)
results.sort_values(by="mean_test_score", ascending=False).head(10)
Out[7]:
mean_fit_time std_fit_time mean_score_time std_score_time param_clf__C param_clf__penalty param_vectorizer__min_df param_vectorizer__ngram_range params split0_test_score split1_test_score split2_test_score split3_test_score split4_test_score mean_test_score std_test_score rank_test_score
37 0.076167 0.002831 0.003030 0.001542 10 l2 1 (1, 2) {'clf__C': 10, 'clf__penalty': 'l2', 'vectoriz... 0.895 0.900 0.895 0.885 0.89 0.893 0.005099 1
0 0.035690 0.004285 0.002435 0.000704 0.1 l2 1 (1, 1) {'clf__C': 0.1, 'clf__penalty': 'l2', 'vectori... 0.895 0.895 0.890 0.890 0.89 0.892 0.002449 2
1 0.057556 0.004545 0.004739 0.001589 0.1 l2 1 (1, 2) {'clf__C': 0.1, 'clf__penalty': 'l2', 'vectori... 0.895 0.895 0.890 0.890 0.89 0.892 0.002449 2
28 0.098490 0.021747 0.004665 0.003109 1 l1 1 (1, 2) {'clf__C': 1, 'clf__penalty': 'l1', 'vectorize... 0.895 0.895 0.890 0.890 0.89 0.892 0.002449 2
56 0.174241 0.005164 0.004954 0.002020 100 l2 1 (1, 3) {'clf__C': 100, 'clf__penalty': 'l2', 'vectori... 0.895 0.885 0.905 0.885 0.89 0.892 0.007483 2
20 0.077374 0.010531 0.005199 0.003462 1 l2 1 (1, 3) {'clf__C': 1, 'clf__penalty': 'l2', 'vectorize... 0.895 0.895 0.890 0.890 0.89 0.892 0.002449 2
19 0.059178 0.007435 0.004714 0.002201 1 l2 1 (1, 2) {'clf__C': 1, 'clf__penalty': 'l2', 'vectorize... 0.895 0.895 0.890 0.890 0.89 0.892 0.002449 2
18 0.028727 0.002456 0.003007 0.001460 1 l2 1 (1, 1) {'clf__C': 1, 'clf__penalty': 'l2', 'vectorize... 0.895 0.895 0.890 0.890 0.89 0.892 0.002449 2
14 0.017252 0.005573 0.004315 0.003970 0.1 l1 10 (1, 3) {'clf__C': 0.1, 'clf__penalty': 'l1', 'vectori... 0.895 0.895 0.890 0.890 0.89 0.892 0.002449 2
13 0.015535 0.004409 0.002851 0.001719 0.1 l1 10 (1, 2) {'clf__C': 0.1, 'clf__penalty': 'l1', 'vectori... 0.895 0.895 0.890 0.890 0.89 0.892 0.002449 2
In [8]:
# Use the best parameters in the pipe and fit with the entire dataset
pipe = pipe.set_params(**grid_search.best_params_)
clf_best = pipe.fit(X_train, y_train)
In [9]:
# print vocabulary size
print(len(clf_best["vectorizer"].get_feature_names_out()))

#vocabulary
#clf_best["vectorizer"].vocabulary_

# the best score achieved
print(clf_best.score(X_train, y_train))
# the best score achieved
print(clf_best.score(X_test, y_test))
53376
0.9993027471761261
0.8958097395243488

Step 3: Interpretation of results¶

In [10]:
# Add predicitons to dataframe
df["predicted"] = clf_best.predict(df["text"])
df["predicted_prob_fake"] = clf_best.predict_proba(df["text"])[:,1]
df
Out[10]:
text label predicted predicted_prob_fake
file_id
12834217_1 As of March 13th , 2014 , the booklet had been... 0.0 0.0 0.017513
12834217_2 In order to help increase the booklets downloa... 0.0 0.0 0.018836
12834217_3 ( Simply copy and paste the following text int... 0.0 0.0 0.012749
12834217_4 Click below for a FREE download of a colorfull... 1.0 1.0 0.692457
12834217_5 Click on the `` DOWNLOAD ( 7.42 MB ) '' green ... 0.0 0.0 0.016740
... ... ... ... ...
33676864_5 Billy - `` That guy would n't leave me alone ,... 0.0 0.0 0.057341
33677019_1 Wish we at least had a Marine Le Pen to vote f... 0.0 0.0 0.048650
33677019_2 Its like the choices are white genocide candid... 0.0 0.0 0.040065
33677053_1 Why White people used to say that sex was a si... 1.0 0.0 0.112943
33677053_2 Now I get it ! 0.0 0.0 0.042926

10703 rows × 4 columns

Interpretation of coefficients in the linear model¶

We can use the coefficients of the Logistic regression

In [11]:
# Extract the coeficients from the omdel
coefs = pd.DataFrame([clf_best["vectorizer"].get_feature_names_out(), 
                      clf_best["clf"].coef_[0]]).T
coefs.columns = ["gram","coef"]
In [12]:
# top words influencing fake
display(coefs.sort_values(by="coef", ascending=False).head(10))

# top words influencing true
display(coefs.sort_values(by="coef", ascending=True).head(10))
gram coef
31434 negroes 8.126356
4218 black 7.268875
26064 liberals 6.232499
18671 groid 6.174792
15506 filth 6.048523
40573 scum 6.042985
1723 ape 5.95158
1732 apes 4.900917
686 africa 4.817527
30570 mud 4.78217
gram coef
53076 youtube -3.435731
39652 said -2.728514
52725 year -2.448316
46099 thanks -2.222029
46670 thread -2.207491
30729 music -2.145789
27704 lot -2.082198
8422 comes -2.038594
19172 hair -2.015077
30983 nationalist -1.957227

Interpretation of coefficients using LIME (Local Interpretable Model-Agnostic Explanations)¶

LIME modifies the text to understand the impact of each word to the predictions.

In [13]:
# Find some extreme examples
less_fake = df.sort_values(by="predicted_prob_fake").head(1).values[0][0]
most_fake = df.sort_values(by="predicted_prob_fake").tail(1).values[0][0]
df_confused = df.loc[df["label"] != df["predicted"]]
pred_fake_not_fake = df_confused.loc[df_confused["label"]==0].sort_values(by="predicted_prob_fake").tail(1).values[0][0]
pred_not_fake_fake = df_confused.loc[df_confused["label"]==1].sort_values(by="predicted_prob_fake").head(1).values[0][0]
pred_50_50 = "She says the class is out of control and the kids are unteachable , and the black administration does not support her "

print("Least hate: ", less_fake)
print("Most hate: ", most_fake)
print("Predicted very hate but not hateful: ", pred_fake_not_fake)
print("Predicted very innocuous but hateful: ", pred_not_fake_fake)
print("Predicted 50/50: ", pred_50_50)
Least hate:  - YouTube
Most hate:  Look what happens when Whites leave black Countries alone to do what they do naturally The blacks in White Countries today should be on their knees thanking Whites for trying to civilize them
Predicted very hate but not hateful:  Too many whites think they deserve what negroes dish out because of guilt .
Predicted very innocuous but hateful:  https://www.stormfront.org/forum/t1020784/ https : //www.stormfront.org/forum/t102 ... ghlight = sweden https : //www.stormfront.org/forum/t102 ... ghlight = sweden https : //www.stormfront.org/forum/t101 ... ghlight = sweden https : //www.stormfront.org/forum/t101 ... ghlight = sweden https : //www.stormfront.org/forum/t100 ... ghlight = sweden https : //www.stormfront.org/forum/t100 ... ghlight = sweden https : //www.stormfront.org/forum/t100 ... ghlight = sweden God save them .....
Predicted 50/50:  She says the class is out of control and the kids are unteachable , and the black administration does not support her 
In [14]:
# start the explainer
explainer = LimeTextExplainer(class_names = ["Innocuous", "Hateful"], bow=False)

# shows the explanation for our example instances
for text in [less_fake, most_fake, pred_fake_not_fake, pred_not_fake_fake, pred_50_50]:
    exp = explainer.explain_instance(text, 
                                     clf_best.predict_proba, 
                                     num_features = 10,
                                    num_samples = 1000)
    exp.show_in_notebook(text=text)
    print(exp.as_list())
    print("-"*100)
[('YouTube', -0.008323447828899103)]
----------------------------------------------------------------------------------------------------
[('black', 0.1530117030774071), ('leave', 0.09460229211793703), ('Whites', 0.08440702391812448), ('blacks', 0.06747991458238094), ('Whites', 0.0663042768418059), ('today', -0.056455629784203834), ('knees', -0.05531198997969092), ('Countries', 0.05215418416881412), ('Countries', 0.04109886367418432), ('happens', 0.025597988325753687)]
----------------------------------------------------------------------------------------------------
[('negroes', 0.5153545730863675), ('whites', 0.16865803264257337), ('guilt', 0.04260701393581915), ('think', -0.020238036445078867), ('many', -0.014668719004717558), ('because', 0.011515653045518066), ('out', 0.008668108104411673), ('Too', -0.00800020931931476), ('they', -0.006445943447796432), ('deserve', 0.005050840088067402)]
----------------------------------------------------------------------------------------------------
[('www', -0.0034857808931292185), ('www', -0.0032015608771622565), ('www', -0.0028891902361937527), ('sweden', 0.0017379058465679586), ('sweden', 0.0013248169330744022), ('sweden', 0.0013241872009841257), ('sweden', 0.0012905366943266289), ('sweden', 0.0011741381828446085), ('sweden', 0.0009545469616767627), ('sweden', 0.0003763950699854398)]
----------------------------------------------------------------------------------------------------
[('black', 0.4121239839237728), ('control', 0.17150454924079028), ('administration', -0.12989071474318323), ('class', -0.08985109690318371), ('kids', -0.057534661043636165), ('does', -0.04198309615168351), ('says', 0.022866222436935202), ('support', 0.01404376744025586), ('the', 0.0062408285781966775), ('unteachable', 0.00552002712867932)]
----------------------------------------------------------------------------------------------------
In [15]:
exp = explainer.explain_instance("I believe Dutch people have inferior food and they should be colonized by Belgium", 
                                 clf_best.predict_proba, 
                                 num_features = 10,
                                num_samples = 1000)
exp.show_in_notebook(text=text)
print(exp.as_list())
print("-"*100)
[('people', -0.009226299460579325), ('food', 0.008532091914201272), ('inferior', 0.007381136867985198), ('Belgium', -0.0032930331397772604), ('believe', -0.001058917896783231), ('they', 0.0001833517378424935), ('by', -0.00013794883401532004), ('and', 0.00013238985184920658), ('I', -0.00011831954157026918), ('Dutch', -0.00010376687225336656)]
----------------------------------------------------------------------------------------------------

Now it's your turn.¶

Either:

  • Adapt RQ1 using different models (e.g. a CNN, as shown below) or data (either the ones described under RQ1, or any other)
  • Or start on a different RQ
In [16]:
#!pip install scikeras
from scikeras.wrappers import KerasClassifier
#from keras.wrappers.scikit_learn import KerasClassifier
from keras_preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras import layers, utils
In [17]:
def plot_history(history, val=0):
    acc = history['accuracy']
    if val == 1:
        val_acc = history['val_accuracy'] # we can add a validation set in our fit function with nn
    loss = history['loss']
    if val == 1:
        val_loss = history['val_loss']
    x = range(1, len(acc) + 1)

    plt.figure(figsize=(12, 5))
    plt.subplot(1, 2, 1)
    plt.plot(x, acc, 'b', label='Training accuracy')
    if val == 1:
        plt.plot(x, val_acc, 'r', label='Validation accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.title('Accuracy')
    plt.legend()
    plt.subplot(1, 2, 2)
    plt.plot(x, loss, 'b', label='Training loss')
    if val == 1:
        plt.plot(x, val_loss, 'r', label='Validation loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.title('Loss')
    plt.legend()
In [18]:
## CREATE MODEL
def create_model(num_filters=64, kernel_size=3, embedding_dim=50, maxlen=100, num_classes=2):
    model = Sequential()
    model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
    model.add(layers.Conv1D(num_filters, kernel_size, activation='relu'))
    model.add(layers.GlobalMaxPooling1D())
    model.add(layers.Dense(10, activation='relu'))
    model.add(layers.Dense(num_classes, activation='sigmoid'))
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

## CLASS FOR PREPROCESSING (needed to work with pipelines)
class preprocessing():
    def __init__(self, num_words=20000, maxlen=100):
        self.maxlen = maxlen
        self.tokenizer = Tokenizer(num_words=num_words)
        
    def fit(self, X, y=None):
        self.tokenizer.fit_on_texts(X)
        return self
    
    def transform(self, X, y=None):
        X_ = self.tokenizer.texts_to_sequences(X)
        return pad_sequences(X_, padding='post', maxlen=self.maxlen) 
 
In [19]:
## PROCESS DATA
X_train, X_test, y_train, y_test = train_test_split(df["text"].values, df["label"].values, test_size=0.33, random_state=42)

# Encode the list of newsgroups into categorical integer values
y_train = utils.to_categorical(y_train)
y_test = utils.to_categorical(y_test)
In [20]:
## CREATE PIPELINE
# Use the best parameters in the pipe and fit with the entire dataset
pipe_preproc = Pipeline([
    ("preproc", preprocessing())])

pipe_est = Pipeline([
    ('clf', KerasClassifier(model=create_model,
                        epochs = 10,
                        batch_size=64,
                        verbose=True,
                        num_filters=32 )) #logistic regression
])

pipe_preproc.fit(X_train)
X_train_p = pipe_preproc.transform(X_train)
X_test_p = pipe_preproc.transform(X_test)
vocab_size = len(pipe_preproc["preproc"].tokenizer.word_index) + 1
print(vocab_size)

# test it works
pipe_est.fit(X_train_p[:500], y_train[:500])
12771
Metal device set to: Apple M1
2022-07-28 11:29:52.668947: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:29:52.669244: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Epoch 1/10
2022-07-28 11:29:53.012129: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:29:53.273335: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
8/8 [==============================] - 2s 38ms/step - loss: 0.6804 - accuracy: 0.8100
Epoch 2/10
8/8 [==============================] - 0s 31ms/step - loss: 0.6414 - accuracy: 0.9020
Epoch 3/10
8/8 [==============================] - 0s 30ms/step - loss: 0.5806 - accuracy: 0.9020
Epoch 4/10
8/8 [==============================] - 0s 33ms/step - loss: 0.5087 - accuracy: 0.9020
Epoch 5/10
8/8 [==============================] - 0s 30ms/step - loss: 0.4325 - accuracy: 0.9020
Epoch 6/10
8/8 [==============================] - 0s 28ms/step - loss: 0.3621 - accuracy: 0.9020
Epoch 7/10
8/8 [==============================] - 0s 32ms/step - loss: 0.3229 - accuracy: 0.9020
Epoch 8/10
8/8 [==============================] - 0s 27ms/step - loss: 0.3084 - accuracy: 0.9020
Epoch 9/10
8/8 [==============================] - 0s 28ms/step - loss: 0.3042 - accuracy: 0.9020
Epoch 10/10
8/8 [==============================] - 0s 29ms/step - loss: 0.2971 - accuracy: 0.9020
Out[20]:
Pipeline(steps=[('clf',
                 KerasClassifier(batch_size=64, epochs=10, model=<function create_model at 0x15322edd0>, num_filters=32, verbose=True))])
In [22]:
pipe_est["clf"].model_.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding (Embedding)       (None, 100, 50)           638550    
                                                                 
 conv1d (Conv1D)             (None, 98, 32)            4832      
                                                                 
 global_max_pooling1d (Globa  (None, 32)               0         
 lMaxPooling1D)                                                  
                                                                 
 dense (Dense)               (None, 10)                330       
                                                                 
 dense_1 (Dense)             (None, 2)                 22        
                                                                 
=================================================================
Total params: 643,734
Trainable params: 643,734
Non-trainable params: 0
_________________________________________________________________
In [23]:
# I'm having some Apple M1 problems (warnings that are not useful). 
# The code below disables those warnings (usually not a good idea)

#import os
# os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
In [24]:
## HYPERPARAMETER TUNING
param_grid = dict(clf__model__num_filters=[32, 64, 128],
                  clf__model__kernel_size=[3, 5, 7],                   
                  clf__model__embedding_dim=[50, 100],
                 clf__verbose=[False])

grid = RandomizedSearchCV(estimator=pipe_est, 
                          param_distributions=param_grid,
                          cv=5, 
                          n_jobs=-1,
                          verbose=True,
                         n_iter=10)

grid.fit(X_train_p[:1000], y_train[:1000])
Fitting 5 folds for each of 10 candidates, totalling 50 fits
Metal device set to: Apple M1
Metal device set to: Apple M1
Metal device set to: Apple M1
Metal device set to: Apple M1
Metal device set to: Apple M1
Metal device set to: Apple M1
Metal device set to: Apple M1
2022-07-28 11:31:45.847851: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:31:45.847984: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:31:45.848731: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:31:45.848844: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:31:45.849644: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:31:45.849753: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:31:45.849899: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:31:45.850001: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:31:45.852124: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:31:45.852283: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:31:45.855491: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:31:45.855942: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:31:45.856140: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:31:45.856443: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:31:46.060452: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:31:46.060765: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Metal device set to: Apple M1
2022-07-28 11:31:46.335480: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:31:46.347713: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:31:46.350372: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:31:46.356936: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:31:46.368465: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:31:46.369808: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:31:46.390346: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:31:46.842067: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:31:46.874273: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:31:47.053004: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:31:47.106584: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:31:47.107602: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:31:47.119101: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:31:47.138170: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:31:47.158375: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:31:47.585681: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:02.126720: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

2022-07-28 11:32:02.670353: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:02.939675: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
/Users/garci061/miniforge3/envs/st/lib/python3.10/site-packages/joblib/externals/loky/process_executor.py:702: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  warnings.warn(
systemMemory: 16.00 GB
maxCacheSize: 5.33 GB


systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

Metal device set to: Apple M1
Metal device set to: Apple M1
2022-07-28 11:32:06.509078: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:32:06.509249: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:32:06.509310: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:32:06.509400: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:32:06.708156: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:32:06.740900: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:32:06.757793: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:32:06.757918: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Metal device set to: Apple M1
2022-07-28 11:32:06.993914: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:32:07.055541: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:07.064827: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:07.393606: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:08.217012: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

2022-07-28 11:32:09.303660: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

2022-07-28 11:32:11.398919: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:12.033015: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:12.161756: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

Metal device set to: Apple M1
2022-07-28 11:32:12.674039: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:32:12.674622: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:32:12.948020: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

2022-07-28 11:32:13.400999: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

Metal device set to: Apple M1
2022-07-28 11:32:14.186092: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:32:14.186227: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:32:14.419735: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:32:14.734624: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:15.943686: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
Metal device set to: Apple M1
2022-07-28 11:32:16.577229: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:32:16.579177: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:32:17.301679: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:32:17.435286: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:17.708619: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:32:17.709237: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:32:17.880538: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
Metal device set to: Apple M1
2022-07-28 11:32:18.058146: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
Metal device set to: Apple M1
2022-07-28 11:32:18.397020: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-28 11:32:18.397129: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2022-07-28 11:32:18.426165: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:18.709174: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-07-28 11:32:19.024930: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:24.018898: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:24.440536: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:24.534981: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:25.655891: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:25.837065: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:25.841337: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:29.367721: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:29.836115: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:33.166459: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:33.679122: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:33.875230: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:34.340003: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:36.487021: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:36.905177: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:38.371717: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:38.799789: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:39.146589: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:39.563851: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:41.156706: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:41.677873: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:44.928712: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:45.158643: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:45.383390: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:45.737218: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:47.242610: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:47.878276: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:48.120971: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:48.625425: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:53.768239: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:54.264958: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:56.359200: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:56.896828: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:57.268512: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:32:58.435680: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:01.826507: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:02.956068: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:04.055038: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:04.620879: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:05.669251: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:06.276722: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:12.867255: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:13.304173: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:13.468903: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:13.834674: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:13.933746: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:14.573999: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:17.492334: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:18.025816: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:19.951920: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:20.256229: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:20.920109: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:21.062041: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:21.386512: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:22.672253: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:24.557671: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:25.100871: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:26.272430: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:26.779936: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:28.899507: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:29.370008: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:32.664785: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:34.261283: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:34.981343: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:36.238755: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:36.412568: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:36.833809: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:38.222444: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:40.104480: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:41.608373: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:42.981704: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:44.717852: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:48.821244: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:48.825634: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:49.237277: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-28 11:33:49.848658: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
Out[24]:
RandomizedSearchCV(cv=5,
                   estimator=Pipeline(steps=[('clf',
                                              KerasClassifier(batch_size=64, epochs=10, model=<function create_model at 0x15322edd0>, num_filters=32, verbose=True))]),
                   n_jobs=-1,
                   param_distributions={'clf__model__embedding_dim': [50, 100],
                                        'clf__model__kernel_size': [3, 5, 7],
                                        'clf__model__num_filters': [32, 64,
                                                                    128],
                                        'clf__verbose': [False]},
                   verbose=True)
In [25]:
print(grid.best_score_)
print(grid.best_params_)
0.892
{'clf__verbose': False, 'clf__model__num_filters': 32, 'clf__model__kernel_size': 3, 'clf__model__embedding_dim': 100}
In [26]:
# Use the best parameters in the pipe and fit with the entire dataset
clf_best = grid.best_estimator_
clf_best = pipe_est.fit(X_train_p, y_train,
                   clf__validation_data=(X_test_p, y_test))
Epoch 1/10
  1/113 [..............................] - ETA: 1:19 - loss: 0.6961 - accuracy: 0.3594
2022-07-28 11:33:56.102679: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
113/113 [==============================] - ETA: 0s - loss: 0.4541 - accuracy: 0.8780
2022-07-28 11:34:00.328055: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
113/113 [==============================] - 5s 39ms/step - loss: 0.4541 - accuracy: 0.8780 - val_loss: 0.3386 - val_accuracy: 0.8933
Epoch 2/10
113/113 [==============================] - 4s 32ms/step - loss: 0.3388 - accuracy: 0.8858 - val_loss: 0.3082 - val_accuracy: 0.8933
Epoch 3/10
113/113 [==============================] - 3s 30ms/step - loss: 0.2478 - accuracy: 0.8939 - val_loss: 0.2470 - val_accuracy: 0.9077
Epoch 4/10
113/113 [==============================] - 3s 30ms/step - loss: 0.1348 - accuracy: 0.9511 - val_loss: 0.2474 - val_accuracy: 0.9080
Epoch 5/10
113/113 [==============================] - 3s 31ms/step - loss: 0.0646 - accuracy: 0.9810 - val_loss: 0.2752 - val_accuracy: 0.9043
Epoch 6/10
113/113 [==============================] - 4s 31ms/step - loss: 0.0303 - accuracy: 0.9940 - val_loss: 0.3128 - val_accuracy: 0.9049
Epoch 7/10
113/113 [==============================] - 4s 31ms/step - loss: 0.0147 - accuracy: 0.9980 - val_loss: 0.3420 - val_accuracy: 0.9032
Epoch 8/10
113/113 [==============================] - 3s 31ms/step - loss: 0.0075 - accuracy: 0.9993 - val_loss: 0.3848 - val_accuracy: 0.8998
Epoch 9/10
113/113 [==============================] - 3s 30ms/step - loss: 0.0035 - accuracy: 0.9996 - val_loss: 0.4301 - val_accuracy: 0.9052
Epoch 10/10
113/113 [==============================] - 3s 31ms/step - loss: 0.0017 - accuracy: 1.0000 - val_loss: 0.4402 - val_accuracy: 0.8989
In [27]:
import pylab as plt
plot_history(clf_best["clf"].history_, val=True)
In [28]:
# Use the best epochs
clf_best = grid.best_estimator_
clf_best = pipe_est.fit(X_train_p, y_train,
                   clf__validation_data=(X_test_p, y_test),
                   clf__epochs=4)
Epoch 1/4
  1/113 [..............................] - ETA: 59s - loss: 0.6953 - accuracy: 0.3281
2022-07-28 11:34:32.792323: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
113/113 [==============================] - ETA: 0s - loss: 0.4559 - accuracy: 0.8778
2022-07-28 11:34:36.634564: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
113/113 [==============================] - 4s 35ms/step - loss: 0.4559 - accuracy: 0.8778 - val_loss: 0.3385 - val_accuracy: 0.8933
Epoch 2/4
113/113 [==============================] - 4s 31ms/step - loss: 0.3390 - accuracy: 0.8858 - val_loss: 0.3052 - val_accuracy: 0.8933
Epoch 3/4
113/113 [==============================] - 3s 31ms/step - loss: 0.2508 - accuracy: 0.8944 - val_loss: 0.2471 - val_accuracy: 0.9029
Epoch 4/4
113/113 [==============================] - 3s 31ms/step - loss: 0.1373 - accuracy: 0.9449 - val_loss: 0.2531 - val_accuracy: 0.9066
In [29]:
# Find some extreme examples
less_fake = "- YouTube"
most_fake = "Look what happens when Whites leave black Countries alone to do what they do naturally The blacks in White Countries today should be on their knees thanking Whites for trying to civilize them"
pred_fake_not_fake = "Too many whites think they deserve what negroes dish out because of guilt ."
pred_not_fake_fake = "https://www.stormfront.org/forum/t1020784/ https : //www.stormfront.org/forum/t102 ... ghlight = sweden https : //www.stormfront.org/forum/t102 ... ghlight = sweden https : //www.stormfront.org/forum/t101 ... ghlight = sweden https : //www.stormfront.org/forum/t101 ... ghlight = sweden https : //www.stormfront.org/forum/t100 ... ghlight = sweden https : //www.stormfront.org/forum/t100 ... ghlight = sweden https : //www.stormfront.org/forum/t100 ... ghlight = sweden God save them ....."
pred_50_50 = "She says the class is out of control and the kids are unteachable , and the black administration does not support her "


    
print("Least fake: ", less_fake)
print("Most fake: ", most_fake)
print("Predicted very fake but not fake: ", pred_fake_not_fake)
print("Predicted very true but fake: ", pred_not_fake_fake)
print("Predicted 50/50: ", pred_50_50)
Least fake:  - YouTube
Most fake:  Look what happens when Whites leave black Countries alone to do what they do naturally The blacks in White Countries today should be on their knees thanking Whites for trying to civilize them
Predicted very fake but not fake:  Too many whites think they deserve what negroes dish out because of guilt .
Predicted very true but fake:  https://www.stormfront.org/forum/t1020784/ https : //www.stormfront.org/forum/t102 ... ghlight = sweden https : //www.stormfront.org/forum/t102 ... ghlight = sweden https : //www.stormfront.org/forum/t101 ... ghlight = sweden https : //www.stormfront.org/forum/t101 ... ghlight = sweden https : //www.stormfront.org/forum/t100 ... ghlight = sweden https : //www.stormfront.org/forum/t100 ... ghlight = sweden https : //www.stormfront.org/forum/t100 ... ghlight = sweden God save them .....
Predicted 50/50:  She says the class is out of control and the kids are unteachable , and the black administration does not support her 
In [30]:
# start the explainer
explainer = LimeTextExplainer(class_names = ["Innocuous", "Hate"], bow=False)

# relying on global objects (not too nice)
def create_proba(text):
    t = pipe_preproc.transform(text)
    return clf_best.predict_proba(t)


# shows the explanation for our example instances
for text in [less_fake, most_fake, pred_fake_not_fake, pred_not_fake_fake, pred_50_50]:
    exp = explainer.explain_instance(text, 
                                     create_proba, 
                                     num_features = 10,
                                    num_samples = 1000)
    exp.show_in_notebook(text=text)
    print(exp.as_list())
    print("-"*100)
16/16 [==============================] - 0s 3ms/step
2022-07-28 11:34:47.453158: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
[('YouTube', -0.006048106422461584)]
----------------------------------------------------------------------------------------------------
16/16 [==============================] - 0s 2ms/step
[('black', 0.17564516591637236), ('them', 0.1730713502056678), ('Countries', 0.1072953338390616), ('Countries', 0.09632486012554035), ('blacks', 0.09560279684615887), ('leave', 0.0928889687572245), ('they', 0.07251502736442184), ('in', -0.06562443216546078), ('do', -0.06013481985526459), ('civilize', 0.049005462434422664)]
----------------------------------------------------------------------------------------------------
16/16 [==============================] - 0s 2ms/step
[('negroes', 0.5407839683690124), ('guilt', 0.0996798238726905), ('they', 0.09083990001563409), ('whites', 0.06668847726330036), ('Too', 0.04226610782525202), ('deserve', 0.041455030277885906), ('many', 0.039044909904479536), ('think', -0.018852046466716968), ('of', 0.014071091668857698), ('out', 0.010663000654574178)]
----------------------------------------------------------------------------------------------------
16/16 [==============================] - 0s 2ms/step
[('God', 0.019535086693903983), ('them', 0.018414271795680722), ('sweden', 0.011708416297367371), ('save', 0.006463289614638867), ('org', 0.002587474461743947), ('sweden', 0.002338195193671646), ('stormfront', -0.0015784559800041808), ('stormfront', -0.001430015586189894), ('stormfront', -0.000956935802031727), ('www', -0.0009278344170942581)]
----------------------------------------------------------------------------------------------------
16/16 [==============================] - 0s 2ms/step
[('black', 0.12659400083549735), ('her', -0.07619141311656674), ('the', 0.06516765533330464), ('support', 0.04171327247029013), ('administration', -0.0394739844532431), ('does', 0.034507430097908666), ('are', 0.013991215055108124), ('and', 0.011879212101886829), ('not', -0.010455288771714415), ('class', -0.010026869999929328)]
----------------------------------------------------------------------------------------------------
In [31]:
exp = explainer.explain_instance("I believe Dutch people have inferior food and they should be colonized by Belgium", 
                                 create_proba, 
                                 num_features = 10,
                                num_samples = 1000)
exp.show_in_notebook(text=text)
print(exp.as_list())
print("-"*100)
16/16 [==============================] - 0s 2ms/step
[('they', 0.06445486105964748), ('be', 0.029905304827850636), ('Belgium', -0.010661036961754007), ('food', -0.0063249991623368085), ('inferior', 0.005178134157707648), ('I', -0.003084906335237635), ('have', -0.0030618668047911325), ('by', 0.0015078686104735845), ('colonized', 0.0014220388819078382), ('Dutch', -0.0012953540318621458)]
----------------------------------------------------------------------------------------------------
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]: